Abstract

In this paper we look at a class of random optimization problems. We discuss ways that can help
determine typical behavior of their solutions. When the dimensions of the optimization problems are large,
such information can often be obtained without actually solving the original problems. Moreover, we
also discover that fairly often one can determine many quantities of interest (such as, for example,
the typical optimal values of the objective functions) completely analytically. We present a few general ideas
and emphasize that the range of applications is enormous.

Index Terms: Linear constraints; duality.

1 Introduction

We start by looking at a class of very simple optimization problems. Namely, we will look at linearly
constrained optimization problems. Such problems can be formulated in the following fairly general way:

$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = 0, \quad B\mathbf{x} \leq 0. \tag{1}$$

For concreteness we will assume that $A$ is an $m_1 \times n$ matrix from $\mathbb{R}^{m_1 \times n}$ and $B$ is an $m_2 \times n$ matrix from
$\mathbb{R}^{m_2 \times n}$. Also, it is rather clear, but we still mention that $f(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}$ is what we will call the objective
function. If one looks at the problem given in (1), the first thing that comes to mind is that it is a
linearly constrained optimization problem (see, e.g. [1]). So, there is really nothing specific about it beyond
the fact that, depending on the type of function $f(\mathbf{x})$, its objective value could be either bounded or unbounded and
the problem can be either feasible or not. To make the exposition easier we will assume that scenarios where
the objective is unbounded or the problem is infeasible are not the subject of our discussion in this paper.
In other words, we will assume that we look only at the scenarios where the objective values can be computed
and are properly bounded. An alternative would be, if say the problem above is unbounded, to simply add
constraints that would ensure boundedness of the objective value; or on the other hand, if the problem above
is say infeasible, to simply remove some of the constraints until it becomes feasible. We will occasionally
throughout the paper look at a few scenarios where we would need to force the boundedness. However, since
there will be those where we will ignore it, we simply preface it right here before we proceed with further
presentation.
Now, let us go back to the optimization problem given in (1). Determining the solution of this problem
and the optimal value of its objective function is of course the ultimate goal. The type of function $f(\mathbf{x})$
is typically what determines if this problem can be solved in polynomial time or not. For a moment let
us assume that $f(\mathbf{x})$ is such that (1) can be solved in polynomial time (in this paper whenever we say
polynomial time, we mean it roughly speaking, i.e. without all the details related to what is in
complexity theory typically called strongly polynomial and all other subtleties that come with similar
considerations). From an algorithmic point of view the above problem is then typically considered solvable. Our
interest in this paper will be slightly different from this classical approach. We will look at a class of these
problems and discuss whether or not it is possible to analytically determine the optimal value of the objective
function. Of course, if the dimension of the problem, $n$, is small (say $n = 2$ or $n = 3$) it is highly likely (no
matter how complicated $f(\mathbf{x})$ can be) that (1) can be solved analytically. As one may guess, our interest will
not be in such small dimensional scenarios either. Instead we will typically look at large values of $n$ and all
other dimensions. Moreover, to facilitate writing, we will typically assume the so-called linear regime, i.e.
we will assume that all dimensions in this paper are large but linearly proportional to $n$. For example, in (1)
we will assume that $m_1 = \alpha_1 n$ and $m_2 = \alpha_2 n$ where both $\alpha_1$ and $\alpha_2$ are constants independent of $n$.

Now, if the dimensions in (1) are large and our goal is to solve it analytically, how exactly do we plan
to go about it? Well, there is really not much we would be able to say right away, for two reasons: 1) we
have not specified $f(\mathbf{x})$, and dealing with an unspecified $f(\mathbf{x})$ could be unpredictable and in fact quite often
impossible; 2) the dimension of the problem is large, which means that the number of constraints is large
as well, and moreover in the general setup that we assume they all act on all components of $\mathbf{x}$, i.e. on all
$\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$. While we will not change much in our specifications of $f(\mathbf{x})$, we will look for a glimmer
of hope in a particular type of constraints. In other words, we will leave the first of the above reasons aside
and try to deal with the second one, hoping that that alone will introduce enough simplifications so that
eventually even the first reason is not that much of a problem. There are many ways one can deal with
the constraint sets $A\mathbf{x} = 0$, $B\mathbf{x} \leq 0$. Our approach will be a random one. More specifically, we will assume that the set
of constraints is drawn from a probability distribution. Since matrices $A$ and $B$ essentially determine the
constraints, we will assume that they are the objects that are random. Moreover, to make the presentation
easier and to make it a bit more concrete, we will also assume that all components of both $A$ and
$B$ are i.i.d. standard normal random variables. This effectively establishes the problem in (1) as a random
optimization problem, and that is the class of optimization problems that will be the subject of our study
in this paper. Fairly often, in the theory of random optimization problems one looks at objective functions
that are also random functions of the unknown $\mathbf{x}$. Our entire exposition can easily be adapted to encompass
such a scenario as well. However, we find it easier from the presentation point of view to assume that $f(\mathbf{x})$
is actually a deterministic function.
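
The random setup just described can be mimicked numerically. The following is a minimal sketch, assuming NumPy; the function names `sample_constraints` and `is_feasible`, the seed, and the particular values of $n$, $\alpha_1$, $\alpha_2$ are illustrative choices and are not part of the paper.

```python
import numpy as np

def sample_constraints(n, alpha1, alpha2, seed=0):
    """Draw A (m1 x n) and B (m2 x n) with i.i.d. standard normal entries.

    m1 = round(alpha1 * n) and m2 = round(alpha2 * n) mimic the linear regime.
    The function name and the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    m1 = int(round(alpha1 * n))
    m2 = int(round(alpha2 * n))
    A = rng.standard_normal((m1, n))
    B = rng.standard_normal((m2, n))
    return A, B

def is_feasible(x, A, B, tol=1e-9):
    """Check the constraints of (1): A x = 0 and B x <= 0, up to a tolerance."""
    return bool(np.all(np.abs(A @ x) <= tol) and np.all(B @ x <= tol))

A, B = sample_constraints(n=200, alpha1=0.3, alpha2=0.2)
print(A.shape, B.shape)                   # (60, 200) (40, 200)
print(is_feasible(np.zeros(200), A, B))   # True: x = 0 satisfies homogeneous constraints
```

Since the constraints in (1) are homogeneous, $\mathbf{x} = 0$ is feasible for every draw of $A$ and $B$, which the last line confirms.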

While there is quite a large literature on algorithmic aspects of random optimization problems,
we stop short of reviewing it here. The main reason is that here we are not interested in a specific instance of
a certain optimization problem but rather in a large class of optimization problems, and it would be fairly hard
to cover all the relevant work without missing some specific portions of it. We do however mention that
the problems we study here are very generic and the literature on any of their particular instances would merit
a solid review of its own. We also mention that our exposition does not rely on using any of the results known for any
specific instance. In that sense the reader is not really required to have much background
in optimization theory beyond a few classical concepts that will be rather obvious from our presentation.
Moreover, any such concepts will be fairly general and not tailored in any way to the classes of problems
we study here.

Now that we have a setup of the introductory problem that we will look at, we briefly describe what
we will present in the rest of the paper and how the paper will be organized. In Section 2 we look at
problem (1) in the above mentioned random context and present several observations that can be useful in
analytically studying typical probabilistic behavior of the solutions of such problems. In Section 3 we then
study two more general versions of the original problem (1), namely nonhomogeneous linear constraints
and additional functional constraints. In Section 4 we then look at several particular objective functions and
present in detail how the mechanisms of Section 3 work. In Section 5 we give a brief discussion and present
several conclusions related to the presented results.

2 Random linearly constrained programs 

In this section we look at problem (1) in a statistical scenario. As mentioned above, for concreteness we 
assume that in 

$$\xi(f, A, B) = \min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = 0, \quad B\mathbf{x} \leq 0, \tag{2}$$

all components of matrices $A$ and $B$ are i.i.d. standard normals. Since the assumed scenario is random
we also need to revisit one of the assumptions we have made right after (1). Namely, we stated that we
will ignore all situations where the objective function is unbounded. Given the statistical scenario we will
slightly modify such a statement by saying that we will assume that the objective in (2) is bounded with
overwhelming probability (under overwhelming probability we in this paper assume a probability that is no
more than a number exponentially decaying in $n$ away from 1). Under such an assumption we then proceed
with the following transformation of (2)

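$$\xi(f, A, B) = \min_{\mathbf{x}} \max_{\nu, \lambda} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} \quad \text{subject to} \quad \lambda_i \geq 0, \; i = 1, 2, \dots, m_2. \tag{3}$$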
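
The step from (2) to (3) rests on a standard observation, spelled out here for a fixed $\mathbf{x}$ as a reading aid (it is not stated explicitly in the text): the inner maximization over the multipliers acts as an indicator of feasibility.

```latex
\max_{\nu,\; \lambda \geq 0} \; \nu^T A\mathbf{x} + \lambda^T B\mathbf{x}
=
\begin{cases}
0, & \text{if } A\mathbf{x} = 0 \text{ and } B\mathbf{x} \leq 0, \\
+\infty, & \text{otherwise.}
\end{cases}
```

Consequently the outer minimization in (3) never selects an infeasible $\mathbf{x}$ (as long as the feasible set is non-empty and the objective is bounded, as assumed above), and on feasible points the objective reduces to $f(\mathbf{x})$.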

In the rest of this section we will present a strategy that can be helpful in obtaining a probabilistic view of
the quantity $\xi(f, A, B)$. We will split the presentation in two parts. In the first part we will present a lower
bound type of strategy whereas in the second part we will present an upper bound type of strategy.

2.1 Lower-bounding strategy 

We will invoke the results of the following lemma which is a slightly modified version of Lemma 3.1 
from [3] (which is a direct consequence of Theorem B from [3]). 

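Lemma 1. Let $A$ be an $m_1 \times n$ matrix with i.i.d. standard normal components and let $B$ be an $m_2 \times n$ matrix
with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $n \times 1$ and $(m_1 + m_2) \times 1$ vectors, respectively,
with i.i.d. standard normal components. Also, let $g$ be a standard normal random variable. Then

$$\begin{aligned}
& P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad \geq P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big).
\end{aligned} \tag{4}$$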

Proof. The proof follows from Theorem B from [3] after a fairly obvious modification of the proof of 
Lemma 3.1 given in [3]. □ 

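Let $\zeta^{(l)}_{\mathbf{x},\nu,\lambda} = \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(l)}(f)$ with $\epsilon_5^{(g)} > 0$ being an arbitrarily small constant
independent of $n$ and $\xi_D^{(l)}(f)$ being a fixed number that we will discuss later in great detail. Also, let
$\mathbf{h} = [\mathbf{h}_A^T \; \mathbf{h}_B^T]^T$, where $\mathbf{h}_A$ is the first $m_1$ components of $\mathbf{h}$ and $\mathbf{h}_B$ is the last $m_2$ components of $\mathbf{h}$. We will
first look at the right-hand side of the inequality in (4). The following is then the probability of interest

$$P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big). \tag{5}$$
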
Before further looking at this probability we will look in a bit more detail at the optimization problem inside 
the probability. We first denote 

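$$L = \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big). \tag{6}$$

Replacing the value of $\zeta^{(l)}_{\mathbf{x},\nu,\lambda}$ we further have

$$\begin{aligned}
L &= \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \\
&= \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 + f(\mathbf{x}) - \xi_D^{(l)}(f)\big).
\end{aligned} \tag{7}$$

One can now do the inner maximization for a fixed $\mathbf{x}$ and fixed $\|[\nu^T \; \lambda^T]\|_2$. We then get

$$L = \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \Big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \|[\nu^T \; \lambda^T]\|_2 \sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 + f(\mathbf{x}) - \xi_D^{(l)}(f)\Big), \tag{8}$$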

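where $\mathbf{h}_{B+}$ is the vector comprised only of non-negative components of $\mathbf{h}_B$. To make sure that $L$ remains
bounded we further have

$$L = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(l)}(f) \quad \text{subject to} \quad \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \Big(\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} - \epsilon_5^{(g)} \sqrt{n}\Big) \leq 0. \tag{9}$$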
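
The inner maximization behind (8) can be checked numerically. The sketch below, assuming NumPy, verifies that over unit-norm multiplier vectors $[\nu^T \; \lambda^T]^T$ with $\lambda \geq 0$ the largest value of $\mathbf{h}^T [\nu^T \; \lambda^T]^T$ equals $\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2}$; the function names, dimensions, and seed are illustrative assumptions.

```python
import numpy as np

def inner_max_closed_form(h_A, h_B):
    """sqrt(||h_A||^2 + ||h_B+||^2), with h_B+ keeping only non-negative entries of h_B."""
    h_B_plus = np.maximum(h_B, 0.0)
    return np.sqrt(h_A @ h_A + h_B_plus @ h_B_plus)

def inner_max_argmax(h_A, h_B):
    """Unit-norm maximizer: nu aligned with h_A, lambda aligned with the positive part of h_B."""
    w = np.concatenate([h_A, np.maximum(h_B, 0.0)])
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
m1, m2 = 30, 20                      # illustrative sizes
h_A = rng.standard_normal(m1)
h_B = rng.standard_normal(m2)
h = np.concatenate([h_A, h_B])

best = inner_max_closed_form(h_A, h_B)
w_star = inner_max_argmax(h_A, h_B)

# Random feasible directions: arbitrary nu, non-negative lambda, unit norm.
W = rng.standard_normal((5000, m1 + m2))
W[:, m1:] = np.abs(W[:, m1:])
W /= np.linalg.norm(W, axis=1, keepdims=True)

print(np.isclose(h @ w_star, best))   # the closed form is attained
print(np.max(W @ h) <= best + 1e-12)  # no sampled feasible direction exceeds it
```

The maximizer simply ignores the negative entries of $\mathbf{h}_B$, since a non-negative $\lambda$ can only lose by putting weight on them.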
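Combining (5), (6), and (9) one then has for the right-hand side of (4)

$$P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) = P(L \geq 0), \tag{10}$$

with $L$ given in (9). Since $\mathbf{h}_A$ is a vector of $m_1$ i.i.d. standard normal variables and $\mathbf{h}_B$ is a vector of $m_2$
i.i.d. standard normal variables it is rather trivial that

$$P\Big(\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} > (1 - \epsilon_1^{(m)}) \sqrt{m_1 + m_2/2}\Big) \geq 1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)},$$

where $\epsilon_1^{(m)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(m)}$ is a constant dependent on $\epsilon_1^{(m)}$ but independent of
$n$. Then one can modify (9) and (10) in the following way

$$\begin{aligned}
P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) &= P(L \geq 0) \\
&\geq \big(1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}\big) P(L^{(1)} \geq 0),
\end{aligned} \tag{11}$$

where $L^{(1)}$ is

$$L^{(1)} = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(l)}(f) \quad \text{subject to} \quad \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \Big((1 - \epsilon_1^{(m)}) \sqrt{m_1 + m_2/2} - \epsilon_5^{(g)} \sqrt{n}\Big) \leq 0. \tag{12}$$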
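
The concentration used to pass from (9) to (12) is easy to observe empirically. Each squared component of $\mathbf{h}_A$ has mean 1, while each squared component of $\mathbf{h}_{B+}$ has mean 1/2 (a component of $\mathbf{h}_B$ is non-negative with probability 1/2), which is where $m_1 + m_2/2$ comes from. The following Monte Carlo sketch, assuming NumPy, reports how far the ratio $\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} / \sqrt{m_1 + m_2/2}$ strays from 1; the function name, the proportions 0.3 and 0.2, and the number of trials are illustrative assumptions.

```python
import numpy as np

def norm_ratio(m1, m2, trials=200, seed=0):
    """Samples of sqrt(||h_A||^2 + ||h_B+||^2) / sqrt(m1 + m2/2)."""
    rng = np.random.default_rng(seed)
    h_A = rng.standard_normal((trials, m1))
    h_B = rng.standard_normal((trials, m2))
    sq = (h_A ** 2).sum(axis=1) + (np.maximum(h_B, 0.0) ** 2).sum(axis=1)
    return np.sqrt(sq / (m1 + m2 / 2))

for n in (100, 1000, 10000):
    # alpha_1 = 0.3 and alpha_2 = 0.2 are illustrative proportions
    r = norm_ratio(int(round(0.3 * n)), int(round(0.2 * n)))
    print(n, float(r.min()), float(r.max()))
```

As $n$ grows, the printed minimum and maximum should tighten around 1, in line with the exponentially small failure probability claimed above.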
We now look at the left-hand side of the inequality in (4). 

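$$\begin{aligned}
& P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad = P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + f(\mathbf{x}) - \xi_D^{(l)}(f) + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 (g - \epsilon_5^{(g)} \sqrt{n})\big) \geq 0\Big).
\end{aligned} \tag{13}$$

Since $P(g < \epsilon_5^{(g)} \sqrt{n}) \geq 1 - e^{-\epsilon_6^{(g)} n}$ (where $\epsilon_6^{(g)}$ is, as all other $\epsilon$'s in this paper are, independent of $n$) from
(13) we have

$$\begin{aligned}
& P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(l)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad \leq \big(1 - e^{-\epsilon_6^{(g)} n}\big) P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + f(\mathbf{x}) - \xi_D^{(l)}(f)\big) \geq 0\Big) + e^{-\epsilon_6^{(g)} n}.
\end{aligned} \tag{14}$$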
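
The scalar tail estimate used before (14) follows from the classical Gaussian bound $P(g \geq t) \leq e^{-t^2/2}$ for $t \geq 0$; with $t = \epsilon_5^{(g)} \sqrt{n}$ this suggests that $\epsilon_6^{(g)} = (\epsilon_5^{(g)})^2 / 2$ is one admissible choice. This particular constant is our observation and is not stated in the text. A small check using only the Python standard library, with hypothetical function names and an illustrative value of $\epsilon_5^{(g)}$:

```python
import math

def gaussian_upper_tail(t):
    """P(g >= t) for a standard normal g, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def exponential_bound(t):
    """Classical bound exp(-t^2 / 2) on the upper tail, valid for t >= 0."""
    return math.exp(-t * t / 2.0)

eps5 = 0.1  # illustrative value of the small constant
for n in (100, 1000, 10000):
    t = eps5 * math.sqrt(n)
    # exact tail probability next to the bound exp(-(eps5^2 / 2) * n)
    print(n, gaussian_upper_tail(t), exponential_bound(t))
```

Both columns decay exponentially in $n$, and the exact tail stays below the bound.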

Connecting (4), (10), (11), and (14) we obtain 

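$$\begin{aligned}
& P\Big(\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(\nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + f(\mathbf{x}) - \xi_D^{(l)}(f)\big) \geq 0\Big) \\
& \quad \geq \frac{1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}}{1 - e^{-\epsilon_6^{(g)} n}}\, P(L^{(1)} \geq 0) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}},
\end{aligned} \tag{15}$$

where $L^{(1)}$ is as given in (12). A further combination of (3) and (15) gives

$$P\big(\xi(f, A, B) - \xi_D^{(l)}(f) \geq 0\big) \geq \frac{1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}}{1 - e^{-\epsilon_6^{(g)} n}}\, P(L^{(1)} \geq 0) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \tag{16}$$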

We are now in position to state the following lemma which is the first of results that we will present that 
relates to the optimal value of the objective of (2). 

Lemma 2. (Lower bound) Let $A$ be an $m_1 \times n$ matrix with i.i.d. standard normal components. Let $B$ be
an $m_2 \times n$ matrix with i.i.d. standard normal components. Assume that $n$ is large and that $m_1 = \alpha_1 n$
and $m_2 = \alpha_2 n$ where $\alpha_1$ and $\alpha_2$ are constants independent of $n$. Let $f(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}$ be a given function
and let $\xi(f, A, B)$ be the objective value of the optimization problem in (2). Assume that $f(\mathbf{x})$ is such that
$|\xi(f, A, B)| < \infty$ with overwhelming probability. Further let $\mathbf{g}$ be an $n \times 1$ vector with i.i.d. standard
normal components. Let $\epsilon$'s in (12) be arbitrarily small constants and let $\xi_D^{(l)}(f)$ be the largest scalar so
that $L^{(1)}$ defined in (12) is non-negative with overwhelming probability. Then,

$$\lim_{n \rightarrow \infty} P\big(\xi(f, A, B) \geq \xi_D^{(l)}(f)\big) = 1.$$

Proof. Follows from the previous discussion. □ 
While the above lemma may sound a bit dry, it is often a fairly powerful tool to deal with random linearly
constrained programs. Its power essentially lies in the potential simplicity of the auxiliary optimization program
(12). It is relatively easy to see that the optimization problem in (12) is substantially simpler than the original
one given in (2). Still, there is no guarantee that (12) is always solvable. That would certainly depend on the
structure of the function $f(\mathbf{x})$. Also, not only does (12) need to be solvable, one should also be able to show
that its solution behaves "nicely", i.e. one should be able to find a quantity $\xi_D^{(l)}$ that is almost certain to be
smaller than the optimal value of the objective of (12). We will towards the end of the paper demonstrate
how the results of this lemma can be used in practice on a small example. The key in such an example (as
well as in any example where the above lemma is to be of any use) will be the ability to probabilistically handle
the much simpler program (12). Before proceeding with further generic considerations of (2), we will in the
next subsection present a corresponding upper-bounding strategy for a probabilistic characterization of
the objective in (2).
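
To give a feel for how much simpler (12) can be, here is a hedged numerical sketch on a toy objective of our own choosing (it is not an example from the paper): $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|_2^2$ with only equality constraints ($m_2 = 0$) and all $\epsilon$'s set to zero. The original problem is a projection onto the null space of $A$, while the auxiliary problem is a projection onto the circular cone $\{\mathbf{x} : \mathbf{g}^T\mathbf{x} + \sqrt{m_1}\,\|\mathbf{x}\|_2 \leq 0\}$, which has a closed form. All function names, the vector $\mathbf{c}$, and the dimensions are assumptions made for illustration.

```python
import numpy as np

def exact_value(A, c):
    """Optimal value of min ||x - c||^2 subject to A x = 0.

    The minimizer is the projection of c onto the null space of A, so the
    optimal value is the squared norm of the row-space component of c.
    """
    y, *_ = np.linalg.lstsq(A.T, c, rcond=None)
    return float(np.sum((A.T @ y) ** 2))

def auxiliary_value(g, c, m):
    """Optimal value of min ||x - c||^2 subject to g^T x + sqrt(m) ||x|| <= 0.

    The feasible set is a circular cone with axis -g and half-angle theta,
    cos(theta) = sqrt(m) / ||g||; the distance is computed in the plane
    spanned by the axis and c.
    """
    g_norm = np.linalg.norm(g)
    cos_t = np.sqrt(m) / g_norm
    if cos_t > 1.0:                       # the cone degenerates to {0}
        return float(c @ c)
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    axial = -(g @ c) / g_norm             # component of c along the axis -g
    radial = np.sqrt(max(c @ c - axial ** 2, 0.0))
    if radial * cos_t <= axial * sin_t:   # c already lies inside the cone
        return 0.0
    if axial * cos_t + radial * sin_t <= 0.0:   # closest point is the apex
        return float(c @ c)
    return float((radial * cos_t - axial * sin_t) ** 2)

rng = np.random.default_rng(0)
n, alpha = 2000, 0.3                      # illustrative sizes
m = int(round(alpha * n))
A = rng.standard_normal((m, n))
g = rng.standard_normal(n)
c = np.ones(n)                            # arbitrary fixed vector with ||c||^2 = n

# Normalized values; both are expected to be near alpha for large n.
print(exact_value(A, c) / n, auxiliary_value(g, c, m) / n, alpha)
```

For this convex toy problem the two normalized values should both land near $\alpha_1$ up to finite-$n$ fluctuations, which is the kind of agreement between the original and the auxiliary program that the lemma is designed to exploit; the sketch only illustrates the mechanism and proves nothing.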

2.2 Virtual upper-bounding strategy 

Before we proceed with the detailed presentation we should make the following point more explicit. Namely,
what we presented in the previous subsection is a concept that is mathematically speaking always correct,
i.e. as long as the problem in (2) is deterministically solvable and its solution is, probabilistically speaking,
bounded. Now, while the concept is correct, it is just a lower bound type of approach. Moreover, the concept
relies on a potential simplicity of (12), so it will be useful if (12) can be handled. However,
no matter if (12) can be handled or not, the entire concept remains valid with very minimal assumptions on
$f(\mathbf{x})$ (in fact, the assumption that $f(\mathbf{x})$ is such that (2) is bounded seems pretty much unavoidable as long
as solving (2) is to make any reasonable practical sense). On the other hand, the strategy that we will present
below will not work generically, i.e. it will require additional assumptions on $f(\mathbf{x})$ beyond those mentioned
in the previous subsection. Since these assumptions may or may not hold, we will preface our presentation
by saying that the upper-bounding strategy that we show below is in a way a virtual strategy. Of course,
we should add that the strategy is not purely virtual. Quite the contrary, it fairly often works; in fact, roughly
speaking, it works almost exactly as complexity theory works, i.e. it is fairly similar to the following
paradigm: "as long as (2) is computationally doable in a reasonable amount of time the strategy will be
working well". Of course, this is a fairly informal statement without any mathematically rigorous type of
language. Establishing the above statement on a more mathematically rigorous level requires a presentation
that goes way beyond the scope of this paper, and we will pursue it elsewhere. We do mention that such a
presentation contains almost no further conceptual insight, i.e. the core of the ideas is already here.
However, it does require an enormous amount of mathematical detailing which we choose to skip to avoid
ruining the elegance of the presentation that we attempt to achieve here.

Going back to (2), in this section we will essentially attempt to mimic the presentation of the previous 
subsection. To that end we start by recalling that our object of interest is the following linearly constrained 
optimization problem: 

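$$\xi(f, A, B) = \min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = 0, \quad B\mathbf{x} \leq 0, \tag{17}$$

which after a bit of juggling becomes

$$\xi(f, A, B) = \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x}. \tag{18}$$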

Now, we recall that $A$ and $B$ are random matrices and the above optimization problems are random. Given
their randomness, sometimes they may be solvable and sometimes they may not. They may be unsolvable
due to the fact that they are not feasible or that they are feasible but the value of the objective function is
unbounded. However, as we did in the previous subsections, we leave all these unfavorable scenarios aside
and preface our presentation assuming that $|\xi(f, A, B)| < \infty$ with overwhelming probability.

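At this point we will attempt to transform (18) assuming that $f(\mathbf{x})$ is such that the transformation is
mathematically possible. Namely, let $f(\mathbf{x})$ be such that

$$\begin{aligned}
\xi(f, A, B) &= \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} \\
&= \max_{\lambda \geq 0, \nu} \min_{\mathbf{x}} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x}.
\end{aligned} \tag{19}$$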
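
Assumption (19) is a genuine restriction: in general only the inequality "max min $\leq$ min max" holds, and the two orders of optimization can differ. A finite toy analogue, assuming NumPy, makes this concrete; here the rows of a matrix play the role of $\mathbf{x}$ (the minimizing variable) and the columns play the role of the multipliers (the maximizing variables). The matrices and function names are illustrative and unrelated to the paper's programs.

```python
import numpy as np

def min_max(M):
    """min over rows i of max over columns j of M[i, j]."""
    return float(M.max(axis=1).min())

def max_min(M):
    """max over columns j of min over rows i of M[i, j]."""
    return float(M.min(axis=0).max())

# No saddle point: swapping the order of optimization changes the value.
M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
print(min_max(M), max_min(M))   # 1.0 -1.0

# Saddle point at S[0, 0]: both orders agree, the analogue of (19) holding.
S = np.array([[2.0, 1.0],
              [4.0, 3.0]])
print(min_max(S), max_min(S))   # 2.0 2.0
```

Convexity of $f(\mathbf{x})$ is the typical structural property under which an equality like (19) can be expected; the text itself only assumes that (19) holds.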

Assuming that (19) holds one can then further write 

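$$\begin{aligned}
\xi(f, A, B) &= \max_{\lambda \geq 0, \nu} \min_{\mathbf{x}} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} \\
&= -\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \; -f(\mathbf{x}) - \nu^T A\mathbf{x} - \lambda^T B\mathbf{x}
\end{aligned} \tag{20}$$

and

$$-\xi(f, A, B) = \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \; -f(\mathbf{x}) - \nu^T A\mathbf{x} - \lambda^T B\mathbf{x}. \tag{21}$$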

Similarly to what was done in the previous subsection we will utilize the results of the following lemma 
which is a slightly modified version of Lemma 3.1 from [3] and an upper-bounding analogue to lower- 
bounding Lemma 1. 

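Lemma 3. Let $A$ be an $m_1 \times n$ matrix with i.i.d. standard normal components and let $B$ be an $m_2 \times n$ matrix
with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $n \times 1$ and $(m_1 + m_2) \times 1$ vectors, respectively,
with i.i.d. standard normal components. Also, let $g$ be a standard normal random variable. Then

$$\begin{aligned}
& P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad \geq P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big).
\end{aligned} \tag{22}$$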



Proof. The proof follows from Theorem B from [3] after a fairly obvious modification of the proof of 
Lemma 3.1 given in [3]. □ 

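Let $\zeta^{(u)}_{\mathbf{x},\nu,\lambda} = \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 + f(\mathbf{x}) - \xi_D^{(u)}(f)$ with $\epsilon_5^{(g)} > 0$ being an arbitrarily small constant
independent of $n$ and $\xi_D^{(u)}(f)$ being a fixed number that we will discuss later in great detail. As in the
previous subsection, let $\mathbf{h} = [\mathbf{h}_A^T \; \mathbf{h}_B^T]^T$, where $\mathbf{h}_A$ is the first $m_1$ components of $\mathbf{h}$ and $\mathbf{h}_B$ is the last $m_2$
components of $\mathbf{h}$. As in the previous subsection we will first look at the right-hand side of the inequality in
(22). The following is then the probability of interest

$$P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big). \tag{23}$$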

Before looking further at this probability we will look in a bit more detail at the optimization problem inside 
the probability. We first denote 

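$$U = \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big). \tag{24}$$

Replacing the value of $\zeta^{(u)}_{\mathbf{x},\nu,\lambda}$ we further have

$$\begin{aligned}
U &= \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \\
&= \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(u)}(f)\big).
\end{aligned} \tag{25}$$

From (25) one then has

$$\begin{aligned}
U &= \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(u)}(f)\big) \\
&\geq \max_{\mathbf{x}} \min_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(u)}(f)\big).
\end{aligned} \tag{26}$$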

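One can now do the inner maximization for a fixed $\mathbf{x}$ and fixed $\|[\nu^T \; \lambda^T]\|_2$ to get

$$\begin{aligned}
U &= \min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(u)}(f)\big) \\
&\geq \max_{\mathbf{x}} \min_{\lambda \geq 0, \nu} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 - f(\mathbf{x}) + \xi_D^{(u)}(f)\big) \\
&= -\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \big(-\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} - \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T + \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 + f(\mathbf{x}) - \xi_D^{(u)}(f)\big), \\
U &\geq -\min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \Big(-\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \|[\nu^T \; \lambda^T]\|_2 \sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} + \epsilon_5^{(g)} \sqrt{n}\, \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 + f(\mathbf{x}) - \xi_D^{(u)}(f)\Big),
\end{aligned} \tag{27}$$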

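where, as in the previous subsection, $\mathbf{h}_{B+}$ is the vector comprised only of non-negative components of $\mathbf{h}_B$. To
make sure that the quantity on the right-hand side remains bounded we further have

$$U \geq -\min_{\mathbf{x}} \; \big(f(\mathbf{x}) - \xi_D^{(u)}(f)\big) \quad \text{subject to} \quad -\mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \Big(\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} + \epsilon_5^{(g)} \sqrt{n}\Big) \leq 0. \tag{28}$$

Let

$$U^{(0)} = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(u)}(f) \quad \text{subject to} \quad -\mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \Big(\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} + \epsilon_5^{(g)} \sqrt{n}\Big) \leq 0. \tag{29}$$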
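Combining (23), (24), and (28) one then has for the right-hand side of (22)

$$P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) = P(U \geq 0) \geq P(-U^{(0)} \geq 0) = P(U^{(0)} \leq 0), \tag{30}$$

with $U^{(0)}$ as given in (29). Since $\mathbf{h}_A$ is a vector of $m_1$ i.i.d. standard normal variables and $\mathbf{h}_B$ is a vector of
$m_2$ i.i.d. standard normal variables it is rather trivial that

$$P\Big(\sqrt{\|\mathbf{h}_A\|_2^2 + \|\mathbf{h}_{B+}\|_2^2} < (1 + \epsilon_1^{(m)}) \sqrt{m_1 + m_2/2}\Big) \geq 1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)},$$

where we recall that, as in the previous subsection, $\epsilon_1^{(m)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(m)}$ is a
constant dependent on $\epsilon_1^{(m)}$ but independent of $n$. Then one can modify (29) and (30) in the following way

$$\begin{aligned}
P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(\|[\nu^T \; \lambda^T]\|_2\, \mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2\, \mathbf{h}^T [\nu^T \; \lambda^T]^T - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) &\geq P(U^{(0)} \leq 0) \\
&\geq \big(1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}\big) P(U^{(1)} \leq 0),
\end{aligned} \tag{31}$$

where $U^{(1)}$ is

$$U^{(1)} = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(u)}(f) \quad \text{subject to} \quad -\mathbf{g}^T\mathbf{x} + \|\mathbf{x}\|_2 \Big((1 + \epsilon_1^{(m)}) \sqrt{m_1 + m_2/2} + \epsilon_5^{(g)} \sqrt{n}\Big) \leq 0. \tag{32}$$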

We now look at the left-hand side of the inequality in (22). Essentially we will just need to repeat the 
corresponding arguments from the previous subsection. A few notational modifications will be in place 
though. We start with 

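$$\begin{aligned}
& P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad = P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} - f(\mathbf{x}) + \xi_D^{(u)}(f) + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2 (g - \epsilon_5^{(g)} \sqrt{n})\big) \geq 0\Big).
\end{aligned} \tag{33}$$

Since $P(g < \epsilon_5^{(g)} \sqrt{n}) \geq 1 - e^{-\epsilon_6^{(g)} n}$ (where $\epsilon_6^{(g)}$ is, as all other $\epsilon$'s in this paper are, independent of $n$) from
(33) we have

$$\begin{aligned}
& P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} + \|[\nu^T \; \lambda^T]\|_2 \|\mathbf{x}\|_2\, g - \zeta^{(u)}_{\mathbf{x},\nu,\lambda}\big) \geq 0\Big) \\
& \quad \leq \big(1 - e^{-\epsilon_6^{(g)} n}\big) P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} - f(\mathbf{x}) + \xi_D^{(u)}(f)\big) \geq 0\Big) + e^{-\epsilon_6^{(g)} n}.
\end{aligned} \tag{34}$$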

Connecting (22), (30), (31), and (34) we obtain 



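$$\begin{aligned}
& P\Big(\min_{\lambda \geq 0, \nu} \max_{\mathbf{x}} \big(-\nu^T A\mathbf{x} - \lambda^T B\mathbf{x} - f(\mathbf{x}) + \xi_D^{(u)}(f)\big) \geq 0\Big) \\
& \quad \geq \frac{1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}}{1 - e^{-\epsilon_6^{(g)} n}}\, P(U^{(1)} \leq 0) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}},
\end{aligned} \tag{35}$$

where $U^{(1)}$ is as given in (32). A further combination of (18) and (35) gives

$$P\big(-\xi(f, A, B) + \xi_D^{(u)}(f) \geq 0\big) \geq \frac{1 - e^{-\epsilon_2^{(m)} (m_1 + m_2/2)}}{1 - e^{-\epsilon_6^{(g)} n}}\, P(U^{(1)} \leq 0) - \frac{e^{-\epsilon_6^{(g)} n}}{1 - e^{-\epsilon_6^{(g)} n}}. \tag{36}$$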

We are now in position to state the following lemma which is a result that helps create an upper-bound 
on the optimal value of the objective of (2). 

Lemma 4. (Virtual upper bound) Let $A$ be an $m_1 \times n$ matrix with i.i.d. standard normal components. Let
$B$ be an $m_2 \times n$ matrix with i.i.d. standard normal components. Assume that $n$ is large and that $m_1 = \alpha_1 n$
and $m_2 = \alpha_2 n$ where $\alpha_1$ and $\alpha_2$ are constants independent of $n$. Let $f(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}$ be a given function
and let $\xi(f, A, B)$ be the objective value of the optimization problem in (2). Assume that $f(\mathbf{x})$ is such that
$|\xi(f, A, B)| < \infty$ with overwhelming probability and that (19) holds. Further let $\mathbf{g}$ be an $n \times 1$ vector
with i.i.d. standard normal components. Let $\epsilon$'s in (32) be arbitrarily small constants and let $\xi_D^{(u)}(f)$ be the
smallest scalar so that $U^{(1)}$ defined in (32) is non-positive with overwhelming probability. Then,

$$\lim_{n \rightarrow \infty} P\big(\xi(f, A, B) \leq \xi_D^{(u)}(f)\big) = 1.$$

Proof. Follows from the previous discussion. □

As was the case with Lemma 2, Lemma 4 may also sound a bit dry. However, as we mentioned right
after Lemma 2, Lemma 4 often turns out to be a fairly powerful tool to deal with random linearly constrained
programs. Its major power essentially lies in the potential simplicity of the auxiliary optimization program (32)
(of course, excluding a couple of technical details, this program is for all practical purposes the same as the
one given in (12)). On the other hand one should keep in mind that Lemma 4 is a bit more restrictive in
that it also requires that $f(\mathbf{x})$ is such that (19) holds. If one for a moment leaves aside this restriction then
the power of the above lemma pretty much relies on one's ability to determine a quantity $\xi_D^{(u)}$ that is almost
certain to be larger than the optimal value of $f(\mathbf{x})$ in (32). Of course, the smaller $\xi_D^{(u)}$ the better the bound. In
a more informal language though, if the duality in (19) holds and if everything else (probabilistically speaking)
behaves "nicely", the success of the above introduced mechanism relies on one's ability to provide a precise
probabilistic analysis of (12) or (32). That is typically highly likely to be possible given that the optimization
program (12) (or (32)) has only one random linear constraint.

3 More sophisticated optimization programs

What we presented in the previous section is an often very powerful mechanism to handle linearly
constrained optimization programs. One then naturally may wonder whether there is a way to extend the above results
to more general classes of optimization problems. The answer is yes, but in our experience such extensions
are typically problem specific. That is of course one of the reasons why we presented the main concepts on
a very simple optimization problem. Instead of listing various other types of problems where the mechanism
presented here can be used equally successfully, we below choose to discuss a few small modifications
which will hopefully provide a hint as to how relatively easily the whole framework can be massaged to fit
into various other scenarios. All these modifications could have been already included in our original setup.
However, we thought that they would make the original problem unnecessarily cumbersome, and in order to
preserve the lightness of the exposition we chose to start with the simplest possible example and then build
from there.

3.1 Non-homogeneous linear constraints

Looking back at problem (1) one can notice that we started with a set of constraints that is basically
homogeneous, i.e. pretty much scaling invariant. In other words, for any $\mathbf{x}$ that is feasible in (1), $c\mathbf{x}$ is feasible as
well as long as $c > 0$. Typically linear constraints are not necessarily homogeneous, and if they are not one
has the following more general version of (1)

$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = \mathbf{a}, \quad B\mathbf{x} \leq \mathbf{b}, \tag{37}$$
where $\mathbf{a}$ is an $m_1 \times 1$ vector from $\mathbb{R}^{m_1}$ and analogously $\mathbf{b}$ is an $m_2 \times 1$ vector from $\mathbb{R}^{m_2}$. Now, given
that in this paper we are dealing with random programs, it is natural to wonder if $\mathbf{a}$ and/or $\mathbf{b}$ are random or
deterministic (fixed). We will below just sketch how our results easily adapt if $\mathbf{a}$ and $\mathbf{b}$ are deterministic.
Essentially, one can pretty much repeat the entire derivation from the previous section. Namely, one can
start by defining the optimal value of the objective in (37) as

$$\xi_{nh}(f, A, B) = \min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = \mathbf{a}, \quad B\mathbf{x} \leq \mathbf{b}, \tag{38}$$

and write an analogue to (3)

$$\xi_{nh}(f, A, B) = \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \nu^T \mathbf{a} + \lambda^T \mathbf{b}. \tag{39}$$

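One can then repeat the entire derivation from the previous section with very minimal and fairly obvious
modifications. We skip such an exercise but mention only the critical differences and final results. The
only difference in the entire derivation will be the form of the auxiliary programs (12) (or (32)). Since (12)
(and (32)) are a more refined version of (9) (and (29)) what will actually change is the structure of these
programs. So instead of them one would have

$$L_{nh} = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(l)}(f) \quad \text{subject to} \quad \mathbf{g}^T\mathbf{x} + \sqrt{\big\| \|\mathbf{x}\|_2 \mathbf{h}_A + \mathbf{a} \big\|_2^2 + \big\| (\|\mathbf{x}\|_2 \mathbf{h}_B + \mathbf{b})_+ \big\|_2^2} - \|\mathbf{x}\|_2\, \epsilon_5^{(g)} \sqrt{n} \leq 0, \tag{40}$$

where $(\|\mathbf{x}\|_2 \mathbf{h}_B + \mathbf{b})_+$ is a vector comprised of non-negative components of the vector $\|\mathbf{x}\|_2 \mathbf{h}_B + \mathbf{b}$. On the
other hand one would have for a corresponding replacement of (29)

$$U_{nh} = \min_{\mathbf{x}} \; f(\mathbf{x}) - \xi_D^{(u)}(f) \quad \text{subject to} \quad -\mathbf{g}^T\mathbf{x} + \sqrt{\big\| \|\mathbf{x}\|_2 \mathbf{h}_A + \mathbf{a} \big\|_2^2 + \big\| (\|\mathbf{x}\|_2 \mathbf{h}_B + \mathbf{b})_+ \big\|_2^2} + \|\mathbf{x}\|_2\, \epsilon_5^{(g)} \sqrt{n} \leq 0. \tag{41}$$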

Of course, for all practical purposes programs (40) and (41) are basically equivalent. The statement of Lemma
2 would then remain in place with the only difference being that $L^{(1)}$ should be replaced by $L_{nh}$. Similarly,
Lemma 4 would remain correct with $U^{(1)}$ being replaced by $U_{nh}$ and with an $f(\mathbf{x})$ being such that the
following modified version of (19) holds

$$\begin{aligned}
\xi_{nh}(f, A, B) &= \min_{\mathbf{x}} \max_{\lambda \geq 0, \nu} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \nu^T \mathbf{a} + \lambda^T \mathbf{b} \\
&= \max_{\lambda \geq 0, \nu} \min_{\mathbf{x}} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \nu^T \mathbf{a} + \lambda^T \mathbf{b}.
\end{aligned} \tag{42}$$

What we presented above is a generic scenario that would work for any given $\mathbf{a}$ and $\mathbf{b}$. Even when $\mathbf{a}$ and
$\mathbf{b}$ are generic, one can of course massage it further and remove the randomness of $\mathbf{h}$ as in the definitions of
$L^{(1)}$ and $U^{(1)}$ (when $\mathbf{a}$ and $\mathbf{b}$ are random this is even easier). We skip these easy exercises.
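
As a sanity check on the structure of the non-homogeneous auxiliary constraint, the sketch below (assuming NumPy) evaluates the left-hand side of the constraint in (40) as we read it here and confirms that it collapses to the homogeneous constraint of (9) when $\mathbf{a} = 0$ and $\mathbf{b} = 0$. The function names and test dimensions are hypothetical, and the signs with which $\mathbf{a}$ and $\mathbf{b}$ enter follow the text as given.

```python
import numpy as np

def nh_constraint(x, g, h_A, h_B, a, b, eps5=0.0):
    """Left-hand side of the constraint in (40), as read from the text."""
    r = np.linalg.norm(x)
    eq_part = r * h_A + a                       # ||x||_2 h_A + a
    ineq_part = np.maximum(r * h_B + b, 0.0)    # (||x||_2 h_B + b)_+
    root = np.sqrt(eq_part @ eq_part + ineq_part @ ineq_part)
    return float(g @ x + root - r * eps5 * np.sqrt(x.size))

def homogeneous_constraint(x, g, h_A, h_B, eps5=0.0):
    """Left-hand side of the constraint in (9)."""
    r = np.linalg.norm(x)
    h_B_plus = np.maximum(h_B, 0.0)
    root = np.sqrt(h_A @ h_A + h_B_plus @ h_B_plus)
    return float(g @ x + r * (root - eps5 * np.sqrt(x.size)))

rng = np.random.default_rng(2)
n, m1, m2 = 50, 15, 10                           # illustrative sizes
x, g = rng.standard_normal(n), rng.standard_normal(n)
h_A, h_B = rng.standard_normal(m1), rng.standard_normal(m2)

lhs_nh = nh_constraint(x, g, h_A, h_B, np.zeros(m1), np.zeros(m2), eps5=0.01)
lhs_h = homogeneous_constraint(x, g, h_A, h_B, eps5=0.01)
print(np.isclose(lhs_nh, lhs_h))   # True: (40) reduces to (9) when a = 0, b = 0
```

Unlike (9), the constraint in (40) is no longer positively homogeneous in $\mathbf{x}$ once $\mathbf{a}$ or $\mathbf{b}$ is non-zero, which is why $\|\mathbf{x}\|_2$ cannot be factored out of the square root.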

3.2 Additional functional constraints 

What we discussed above is an upgrade of the existing set of constraints. Instead one may wonder how
the mechanism would fare if the linear structure of the constraints were changed to include more general
constraints. For example, instead of (1) one may look at its more general version

$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = 0, \quad B\mathbf{x} \leq 0, \quad f_i(\mathbf{x}) \leq 0, \; i = 1, 2, \dots, l, \tag{43}$$

where each $f_i(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}$ is a not necessarily linear function of $\mathbf{x}$ (of course, there is really no need to
restrict to scalar functions; i.e. all the major steps that we present below can be repeated/extended to pretty
much any kind of function). Similarly to what we discussed in the previous subsection, these functions can
be random or deterministic. To make writing easier we will assume that they are generic, i.e. deterministic.
One can then again proceed as above by introducing

$$\xi_{afc}(f, f_1, f_2, \dots, f_l, A, B) = \min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad A\mathbf{x} = 0, \quad B\mathbf{x} \leq 0, \quad f_i(\mathbf{x}) \leq 0, \; i = 1, 2, \dots, l, \tag{44}$$

and writing an analogue to (3)

$$\xi_{afc}(f, f_1, f_2, \dots, f_l, A, B) = \min_{\mathbf{x}} \max_{\lambda \geq 0, \gamma_i \geq 0, \nu} \; f(\mathbf{x}) + \nu^T A\mathbf{x} + \lambda^T B\mathbf{x} + \sum_{i=1}^{l} \gamma_i f_i(\mathbf{x}). \tag{45}$$

One can then again repeat the entire derivation from the previous section with minimal modifications.
As in the previous subsection, we skip this exercise and mention only the critical differences and final
results. As was the case above for the non-homogeneous linear constraints, the only difference
in the repeated derivation is the form of the auxiliary programs (12) (or (32)). Instead of them
one would have

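$$\begin{aligned} L_{afc}^{(l)} = \min_{x}\quad & f(x) \\ \text{subject to}\quad & g^T x + \|x\|_2 \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & f_i(x) \le 0,\ i = 1, 2, \dots, l. \end{aligned} \qquad (46)$$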



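On the other hand, as a corresponding replacement of (29) one would have

$$\begin{aligned} U_{afc}^{(u)} = \min_{x}\quad & f(x) - \xi_D^{(u)}(f) \\ \text{subject to}\quad & -g^T x + \|x\|_2 \left((1 + \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} + \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & f_i(x) \le 0,\ i = 1, 2, \dots, l. \end{aligned} \qquad (47)$$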

Of course, for all practical purposes programs (46) and (47) are basically equivalent. The statement of Lemma 2
would then remain in place, with the only difference being that its auxiliary quantity should be replaced by $L_{afc}^{(l)}$. Similarly,
Lemma 4 would remain correct with its auxiliary quantity being replaced by $U_{afc}^{(u)}$ and with $f(x)$ being such that the
following modified version of (19) holds

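$$\begin{aligned} \xi_{afc}(f, A, B) &= \min_{x} \max_{\lambda \ge 0,\, \gamma_i \ge 0,\, \nu} \; f(x) + \nu^T A x + \lambda^T B x + \sum_{i=1}^{l} \gamma_i f_i(x) \\ &= \max_{\lambda \ge 0,\, \gamma_i \ge 0,\, \nu} \min_{x} \; f(x) + \nu^T A x + \lambda^T B x + \sum_{i=1}^{l} \gamma_i f_i(x). \end{aligned} \qquad (48)$$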

What we presented above is a generic scenario where all functions $f_i(x)$ are assumed to be deterministic.
Of course, some of the additional constraints (sometimes even all of them) can be random functions as well.
They can then usually be massaged further, either when handling (46) (or (47)) or in the derivation process
from Section 2. However, the way to handle them is problem specific; we treat them
on a case-by-case basis and choose to present such discussions elsewhere.

It is relatively easy to see that the non-homogeneous case from the previous subsection and
the case of additional functional constraints considered in this subsection can be merged. We
skip rewriting this easy exercise. Instead, in the following section we provide a specific example
to demonstrate how the entire mechanism can be applied. Moreover, the example is selected so that
the mechanism works in its full capacity, i.e. with all assumptions being satisfied and both Lemma 2 and
Lemma 4 being useful and essentially providing matching lower and upper bounds on the optimal value of
the objective function.

4 An example: homogeneous f(x) with spherical bounding constraint 

In this section we demonstrate how the mechanism from the previous sections can be applied to a particular
optimization problem. We start by assuming a specific type of objective function. We will assume that
$f(x)$ is a homogeneous function. Namely, let $f_h(x)$ be such that

$$f_h(ax) = a^d f_h(x), \qquad (49)$$

for any $a > 0$ and a fixed $d > 0$. Then we say that the function $f_h(x)$ is positive homogeneous of degree d.
For such a function the optimization problem (1) is, for all practical purposes, useless. Basically, if there is a feasible x
such that $f_h(x) < 0$, one can keep multiplying such an x by a sequence of arbitrarily large increasing
constants a (which preserves feasibility in (1)) and, no matter how small d is, the value of $f_h(ax)$ will converge to $-\infty$. To
make problem (1) bounded we add an origin-encapsulating closed set to act as an additional bounding
constraint. There is really no restriction as to what this constraint needs to be. However, to facilitate concrete
computations we will assume the most typical spherical constraint. One then has a reformulated version of
(1)

$$\begin{aligned} \xi_h(f_h, A, B) = \xi_{afc}(f_h, f_1, A, B) = \min_{x}\quad & f_h(x) \\ \text{subject to}\quad & Ax = 0 \\ & Bx \le 0 \\ & f_1(x) = \|x\|_2 - 1 \le 0. \end{aligned} \qquad (50)$$

Now, the mechanism of Section 2.1 can be used. A way to provide a lower bound based on such a mechanism
is to determine a quantity $\xi_D^{(l)}(f)$ such that $L_h^{(l)}$ below is non-negative with overwhelming probability.

$$\begin{aligned} L_h^{(l)} = \min_{x}\quad & f(x) - \xi_D^{(l)}(f) \\ \text{subject to}\quad & g^T x + \|x\|_2 \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (51)$$



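Of course, the larger $\xi_D^{(l)}(f)$ is, the harder it is for $L_h^{(l)}$ to stay non-negative. So, roughly speaking, the best
$\xi_D^{(l)}(f)$ would be the one that makes $L_h^{(l)}$ equal to zero (or, to be more precise, the one that makes $L_h^{(l)}$ stay
just above zero). When $\xi_h(f_h, A, B) < 0$, the optimization problem in (51) can be simplified a bit, since a negative optimum of a positive homogeneous objective is attained on the sphere $\|x\|_2 = 1$:

$$\begin{aligned} L_h^{(l)} = \min_{x}\quad & f(x) - \xi_D^{(l)}(f) \\ \text{subject to}\quad & g^T x + \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (52)$$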

(Throughout the rest of this section we pretty much ignore the scenario in which there is no feasible x
with $f_h(x) < 0$, since in that case one trivially has $\xi_h(f_h, A, B) = 0$.) On the other hand, if
one sets $f_1(x) = \|x\|_2 - 1$ and $f_h(x)$ is such that (48) holds, then one can also utilize the mechanism of
Section 2.2. A way to provide an upper bound on $\xi_h(f_h, A, B)$ based on such a mechanism is to determine
a quantity $\xi_D^{(u)}(f)$ such that $U_h^{(u)}$ below is non-positive with overwhelming probability.

$$\begin{aligned} U_h^{(u)} = \min_{x}\quad & f_h(x) - \xi_D^{(u)}(f) \\ \text{subject to}\quad & g^T x + \|x\|_2 \left((1 + \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} + \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (53)$$

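Of course, the smaller $\xi_D^{(u)}(f)$ is, the harder it is for $U_h^{(u)}$ to stay non-positive. Again, roughly speaking, the best
$\xi_D^{(u)}(f)$ would be the one that makes $U_h^{(u)}$ equal to zero (or, to be more precise, the one that makes $U_h^{(u)}$ stay
just below zero). The optimization problem in (53) can be simplified a bit

$$\begin{aligned} U_h^{(u)} = \min_{x}\quad & f_h(x) - \xi_D^{(u)}(f) \\ \text{subject to}\quad & g^T x + \left((1 + \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} + \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (54)$$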

Of course, roughly speaking (basically ignoring all $\epsilon$'s), for all practical purposes programs (52) and (54) are
equivalent. This essentially means that if $f_h(x)$ is such that (48) holds, then not only will $\xi_D^{(l)}(f)$ be a lower
bound on $\xi_h(f_h, A, B)$ with probability 1 as $n \to \infty$, but also its small variation $\xi_D^{(u)}(f)$ will be an upper
bound on $\xi_h(f_h, A, B)$ with probability 1 as $n \to \infty$. In other words, the probability that $\xi_h(f_h, A, B)$
will substantially deviate from $\xi_D^{(l)}(f)$ will go to zero as $n \to \infty$.

Now, to demonstrate how one would proceed further we will look at a couple of particular examples of
homogeneous functions.

4.1 Purely linear f(x)

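We will first look at quite likely the simplest possible example for $f(x)$, namely a purely linear function.
So, we will set

$$f_{lp}(x) = \sum_{i=1}^{n} x_i. \qquad (55)$$

Then (52) becomes

$$\begin{aligned} L_{lp}^{(l)} = \min_{x}\quad & \sum_{i=1}^{n} x_i - \xi_D^{(l)}(f_{lp}) \\ \text{subject to}\quad & g^T x + \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (56)$$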

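Also, to make writing easier we will set

$$\sqrt{D^{(l)}} = \left((1 - \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} - \epsilon_5^{(g)}\right). \qquad (57)$$

Now we rewrite (56) in the following more convenient way

$$\begin{aligned} L_{lp}^{(l)} = \min_{x} \max_{\lambda \ge 0}\quad & \sum_{i=1}^{n} x_i + \lambda g^T x + \lambda \sqrt{D^{(l)}}\sqrt{n} - \xi_D^{(l)}(f_{lp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (58)$$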
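Since the duality easily holds one then further has

$$\begin{aligned} L_{lp}^{(l)} = \max_{\lambda \ge 0} \min_{x}\quad & \sum_{i=1}^{n} x_i + \lambda g^T x + \lambda \sqrt{D^{(l)}}\sqrt{n} - \xi_D^{(l)}(f_{lp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (59)$$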



After solving the inner minimization we finally have

$$L_{lp}^{(l)} = \max_{\lambda \ge 0}\left(-\|\mathbf{1} + \lambda g\|_2 + \lambda \sqrt{D^{(l)}}\sqrt{n}\right) - \xi_D^{(l)}(f_{lp}), \qquad (60)$$
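
The inner minimization that leads to (60) is a one-line Cauchy-Schwarz argument. The worked step below is a sketch added for convenience; $w$ is an auxiliary shorthand that is not part of the original notation.

```latex
% w is a hypothetical shorthand for the vector multiplying x in (59)
w = \mathbf{1} + \lambda g, \qquad
\sum_{i=1}^{n} x_i + \lambda g^T x = w^T x \;\ge\; -\|w\|_2 \|x\|_2 \;\ge\; -\|w\|_2
\quad \text{for all } \|x\|_2 \le 1,
\\
\text{with equality at } x = -w/\|w\|_2 \ (w \neq 0), \quad \text{so} \quad
\min_{\|x\|_2 \le 1} w^T x = -\|\mathbf{1} + \lambda g\|_2 .
```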

where $\mathbf{1}$ is the n-dimensional column vector of all ones. Now, clearly, $L_{lp}^{(l)}$ is a random quantity. To
completely understand its random behavior one would need to study it in full detail. However, since this
paper is mostly concerned with a conceptual approach rather than with the details of particular calculations,
we will skip all unnecessary portions and focus only on the main results. To that end we just mention,
without proof, that $L_{lp}^{(l)}$ concentrates around its mean with overwhelming probability (the proof of this
fact is not hard; however, we feel that going into such details would sidetrack our exposition; instead we
mention that a great deal of the details needed for proofs of this type can be found in e.g. [5, 6] as well as in
many general probability references). Given all of this, it is clear that to apply the results of Lemma 2 it
is enough to compute $EL_{lp}^{(l)}$ and then choose $\xi_D^{(l)}(f_{lp})$ such that $EL_{lp}^{(l)} > 0$. When n is large one then
has

$$\lim_{n\to\infty} \frac{EL_{lp}^{(l)}}{\sqrt{n}} = \max_{\lambda \ge 0}\left(-\sqrt{1 + \lambda^2} + \lambda\sqrt{D^{(l)}}\right) - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{lp})}{\sqrt{n}}, \qquad (61)$$

which after solving over $\lambda$ gives

$$\lim_{n\to\infty} \frac{EL_{lp}^{(l)}}{\sqrt{n}} = \begin{cases} -\sqrt{1 - D^{(l)}} - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{lp})}{\sqrt{n}}, & \text{if } D^{(l)} < 1 \\ -\lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{lp})}{\sqrt{n}}, & \text{otherwise.} \end{cases} \qquad (62)$$
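
As a rough numerical illustration of the step from (60) to (62), the sketch below draws a single Gaussian vector g, maximizes the bracket in (60) over $\lambda$, and compares the value normalized by $\sqrt{n}$ with $-\sqrt{1 - D^{(l)}}$. It is only a sanity check under stated assumptions: all $\epsilon$'s are set to zero, the $\xi_D^{(l)}$ term is dropped, and the function names, the search interval for $\lambda$, the value of n, and the seed are illustrative choices that do not come from the paper.

```python
import math
import random


def aux_max_over_lambda(g, sqrt_d, lam_hi=50.0, iters=200):
    """Maximize -||1 + lam*g||_2 + lam*sqrt_d*sqrt(n) over lam in [0, lam_hi].

    The objective is concave in lam, so a ternary search is sufficient.
    """
    n = len(g)
    s = sum(g)                  # 1^T g
    q = sum(v * v for v in g)   # g^T g
    c = sqrt_d * math.sqrt(n)

    def h(lam):
        # ||1 + lam*g||_2^2 = n + 2*lam*s + lam^2*q (guarded against rounding)
        return -math.sqrt(max(n + 2.0 * lam * s + lam * lam * q, 0.0)) + lam * c

    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    return h(0.5 * (lo + hi))


def normalized_check(n=10000, alpha1=0.5, alpha2=0.5, seed=0):
    """Return (empirical, predicted) values of the bracket in (60)/(62), divided by sqrt(n)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = alpha1 + alpha2 / 2.0        # D^{(l)} with all epsilons set to zero (assumption)
    empirical = aux_max_over_lambda(g, math.sqrt(d)) / math.sqrt(n)
    predicted = -math.sqrt(1.0 - d)  # first case of (62), valid only for d < 1
    return empirical, predicted


if __name__ == "__main__":
    print(normalized_check())
```

For a single draw the two numbers should agree only up to fluctuations that shrink as n grows, which is the concentration property mentioned in the text.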



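Now if we recall the definition of $D^{(l)}$ from (57) and set

$$\xi_D^{(l)}(f_{lp}) = \begin{cases} -\sqrt{1 - \left((1 - \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} - \epsilon_5^{(g)}\right)^2}\,\sqrt{n}, & \text{if } \left((1 - \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} - \epsilon_5^{(g)}\right)^2 < 1 \\ 0, & \text{otherwise,} \end{cases} \qquad (63)$$

we then, based on Lemma 2 and the previous discussion, have

$$\lim_{n\to\infty} P\left(\xi_h(f_{lp}, A, B) \ge \xi_D^{(l)}(f_{lp})\right) = 1, \qquad (64)$$

where $\xi_D^{(l)}(f_{lp})$ is as in (63).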

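Since for $f_{lp}(x)$ from (55) and $f_1(x) = \|x\|_2 - 1$ (48) holds, one can now, analogously to (57), set

$$\sqrt{D^{(u)}} = \left((1 + \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} + \epsilon_5^{(g)}\right), \qquad (65)$$

and write the following analogue to (58)

$$\begin{aligned} U_{lp}^{(u)} = \min_{x} \max_{\lambda \ge 0}\quad & \sum_{i=1}^{n} x_i - \lambda g^T x + \lambda \sqrt{D^{(u)}}\sqrt{n} - \xi_D^{(u)}(f_{lp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (66)$$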
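After repeating the previous arguments and relying on Lemma 4 one then arrives at

$$\lim_{n\to\infty} P\left(\xi_h(f_{lp}, A, B) \le \xi_D^{(u)}(f_{lp})\right) = 1, \qquad (67)$$

where $\xi_D^{(u)}(f_{lp})$ would, analogously to (63), be

$$\xi_D^{(u)}(f_{lp}) = \begin{cases} -\sqrt{1 - \left((1 + \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} + \epsilon_5^{(g)}\right)^2}\,\sqrt{n}, & \text{if } \left((1 + \epsilon_1^{(m)})\sqrt{\alpha_1 + \alpha_2/2} + \epsilon_5^{(g)}\right)^2 < 1 \\ 0, & \text{otherwise.} \end{cases} \qquad (68)$$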

We summarize the above presentation in the following convenient lemma. 

Lemma 5. Consider the optimization problem in (50). Let $f_h(x) = f_{lp}(x) = \sum_{i=1}^{n} x_i$. Let A be an $m_1 \times n$
matrix with i.i.d. standard normal components. Let B be an $m_2 \times n$ matrix with i.i.d. standard normal
components. Assume that n is large and that $m_1 = \alpha_1 n$ and $m_2 = \alpha_2 n$ where $\alpha_1$ and $\alpha_2$ are constants
independent of n. Let $\xi_D^{(l)}(f_{lp})$ and $\xi_D^{(u)}(f_{lp})$ be as in (63) and (68), respectively. Let the $\epsilon$'s in (63) and (68) be
arbitrarily small constants independent of n. Then

$$\lim_{n\to\infty} P\left(\xi_D^{(l)}(f_{lp}) \le \xi_h(f_{lp}, A, B) \le \xi_D^{(u)}(f_{lp})\right) = 1.$$

Proof. Follows from previous discussion. □

Table 1: Experimental results for (50); $\alpha_1 = 0.5$; (50) was run 1000 times with $n = 200$

$\alpha_2$                                                  0.5      0.6      0.7      0.8      0.9      1
$E\xi_h(f_{lp},A,B)/\sqrt{n}$ (sim.)                    -0.4979  -0.4433  -0.3792  -0.3040  -0.2044  -0.0723
$\lim_{n\to\infty} E\xi_h(f_{lp},A,B)/\sqrt{n}$ (th.)   -0.5000  -0.4472  -0.3873  -0.3162  -0.2236  -0.0000

More informally, assume the setup of Lemma 5. If $1 - \alpha_1 - \alpha_2/2 > 0$, then only with very
low probability would the optimal value of the objective function in (50), $\xi_h(f_{lp}, A, B)$, deviate substantially from
$-\sqrt{1 - \alpha_1 - \alpha_2/2}\,\sqrt{n}$. On the other hand, if $1 - \alpha_1 - \alpha_2/2 \le 0$, then with very high probability the
optimal value of the objective function in (50), $\xi_h(f_{lp}, A, B)$, is zero.
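
The informal statement above is easy to evaluate. The short sketch below computes the predicted optimal value normalized by $\sqrt{n}$, with all $\epsilon$'s taken as zero, for $\alpha_1 = 0.5$ and the $\alpha_2$ values used in Table 1. The function name is an illustrative choice and not part of the paper.

```python
import math


def predicted_normalized_optimum(alpha1, alpha2):
    """Predicted xi_h(f_lp, A, B) / sqrt(n) for large n, with all epsilons taken as zero."""
    gap = 1.0 - alpha1 - alpha2 / 2.0
    return -math.sqrt(gap) if gap > 0 else 0.0


if __name__ == "__main__":
    alpha1 = 0.5
    for alpha2 in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
        value = predicted_normalized_optimum(alpha1, alpha2)
        print(f"alpha2 = {alpha2:.1f}   predicted xi_h/sqrt(n) = {value:.4f}")
```

Rounded to four decimals, these values coincide with the theoretical row of Table 1.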



4.1.1 Numerical example 

To give a bit more flavor as to how useful the results from the previous subsection are in practice,
we conducted a limited set of numerical experiments. Namely, we solved problem (50) with A and B as
randomly generated i.i.d. Gaussian matrices and $f_h(x) = f_{lp}(x) = \sum_{i=1}^{n} x_i$. We repeated our experiment
a number of times with different (but of course random) A and B. The results we obtained are summarized
in Table 1. The second row contains the numerical values obtained through the simulations and the third
row contains the numerical values that the above theory predicts. As can be seen from Table 1, even for a
fairly small value of n one has a solid agreement between what the above theory predicts and the results
obtained through numerical experiments. The results presented in Table 1 are given for the expected
values whereas Lemma 5 gives a probabilistic type of behavior. However, as we mentioned earlier, all
important quantities do concentrate, and they concentrate around their mean values.



4.2 General linear f(x) 

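We will now slightly extend the results from the previous subsection. Namely, instead of looking at a purely
linear function $f(x)$ we will look at general linear functions. So, we will set

$$f_{gl}(x) = \sum_{i=1}^{n} c_i x_i = c^T x, \qquad (69)$$

where c is a deterministic (fixed) $n \times 1$ vector from $\mathbb{R}^n$. For concreteness we will also set $C_{gl} = \|c\|_2/\sqrt{n}$ and
assume $C_{gl} < \infty$ as $n \to \infty$. As in the previous subsection one can then consider

$$\begin{aligned} L_{gl}^{(l)} = \min_{x}\quad & \sum_{i=1}^{n} c_i x_i - \xi_D^{(l)}(f_{gl}) \\ \text{subject to}\quad & g^T x + \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (70)$$

After repeating all the steps from the previous subsection one then arrives at

$$\lim_{n\to\infty} \frac{EL_{gl}^{(l)}}{\sqrt{n}} = \max_{\lambda \ge 0}\left(-\sqrt{C_{gl}^2 + \lambda^2} + \lambda\sqrt{D^{(l)}}\right) - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{gl})}{\sqrt{n}}, \qquad (71)$$

which after solving over $\lambda$ gives

$$\lim_{n\to\infty} \frac{EL_{gl}^{(l)}}{\sqrt{n}} = \begin{cases} -C_{gl}\sqrt{1 - D^{(l)}} - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{gl})}{\sqrt{n}}, & \text{if } D^{(l)} < 1 \\ -\lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{gl})}{\sqrt{n}}, & \text{otherwise.} \end{cases} \qquad (72)$$

One can then repeat all remaining arguments from the previous subsection to arrive at the following (more
general) analogue of Lemma 5.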

Lemma 6. Consider the optimization problem in (50). Let $f_h(x) = f_{gl}(x) = \sum_{i=1}^{n} c_i x_i$, where c is a
deterministic (fixed) $n \times 1$ vector from $\mathbb{R}^n$. Set $C_{gl} = \|c\|_2/\sqrt{n}$ and assume $C_{gl} < \infty$ as $n \to \infty$. Let A be an $m_1 \times n$
matrix with i.i.d. standard normal components. Let B be an $m_2 \times n$ matrix with i.i.d. standard normal
components. Assume that n is large and that $m_1 = \alpha_1 n$ and $m_2 = \alpha_2 n$ where $\alpha_1$ and $\alpha_2$ are constants
independent of n. Let $\xi_D^{(l)}(f_{gl}) = C_{gl}\,\xi_D^{(l)}(f_{lp})$ and $\xi_D^{(u)}(f_{gl}) = C_{gl}\,\xi_D^{(u)}(f_{lp})$ where $\xi_D^{(l)}(f_{lp})$ and $\xi_D^{(u)}(f_{lp})$ are
as in (63) and (68), respectively. Let the $\epsilon$'s in (63) and (68) be arbitrarily small constants independent of n.
Then

$$\lim_{n\to\infty} P\left(\xi_D^{(l)}(f_{gl}) \le \xi_h(f_{gl}, A, B) \le \xi_D^{(u)}(f_{gl})\right) = 1.$$



Proof. Follows from previous discussion. □ 

Remark: Knowing the results of Lemma 5 one can deduce Lemma 6 even faster. For example, one can observe
that $f_{gl}(x) = c^T x = C_{gl}\mathbf{1}^T Q_c x$ where $Q_c$ is an $n \times n$ matrix such that $Q_c^T Q_c = I$. Then (50) with
$f_h(x) = f_{gl}(x)$ becomes

$$\begin{aligned} \xi_h(f_h, A, B) = \xi_{afc}(f_h, f_1, A, B) = \min_{x}\quad & C_{gl}\mathbf{1}^T Q_c x \\ \text{subject to}\quad & A Q_c^T Q_c x = 0 \\ & B Q_c^T Q_c x \le 0 \\ & f_1(x) = \|Q_c x\|_2 - 1 \le 0. \end{aligned} \qquad (73)$$

After a change of variables $A_{rot} = A Q_c^T$, $B_{rot} = B Q_c^T$, and $x_{rot} = Q_c x$ one further has

$$\begin{aligned} \xi_h(f_h, A, B) = \xi_{afc}(f_h, f_1, A, B) = \min_{x_{rot}}\quad & C_{gl}\mathbf{1}^T x_{rot} \\ \text{subject to}\quad & A_{rot} x_{rot} = 0 \\ & B_{rot} x_{rot} \le 0 \\ & f_1(x) = \|x_{rot}\|_2 - 1 \le 0. \end{aligned} \qquad (74)$$

Now, observing that due to the rotational invariance of the Gaussian distribution the matrices $A_{rot}$ and $B_{rot}$ are again
comprised of i.i.d. standard normals, one effectively has the same optimization problem as in the previous
subsection. The only difference is that the objective function is multiplied by $C_{gl}$, which is exactly what
Lemma 6 states should be the case.
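
The remark only requires that some orthogonal $Q_c$ with $c^T = C_{gl}\mathbf{1}^T Q_c$ exists. One concrete way to obtain such a matrix, shown below purely as an illustration, is a Householder reflection that maps c onto $C_{gl}\mathbf{1}$. The paper does not say how $Q_c$ is constructed, so this choice, the helper names, and the test vector are assumptions.

```python
import math
import random


def householder_rotation(c):
    """Return (Q, C) with Q orthogonal and Q c = C * 1, where C = ||c||_2 / sqrt(n).

    Q is the Householder reflection I - 2 v v^T / (v^T v) with v = c - C * 1.
    It is symmetric, so c^T x = C * 1^T Q x for every x.
    """
    n = len(c)
    norm_c = math.sqrt(sum(v * v for v in c))
    scale = norm_c / math.sqrt(n)
    v = [ci - scale for ci in c]
    vv = sum(t * t for t in v)
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # If c is already (numerically) equal to C * 1, the identity matrix works.
    if vv > 1e-12 * max(norm_c * norm_c, 1.0):
        for i in range(n):
            for j in range(n):
                q[i][j] -= 2.0 * v[i] * v[j] / vv
    return q, scale


def check_identity(n=6, seed=1):
    """Compare c^T x with C * 1^T Q x for a random c and x."""
    rng = random.Random(seed)
    c = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q, scale = householder_rotation(c)
    qx = [sum(q[i][j] * x[j] for j in range(n)) for i in range(n)]
    lhs = sum(ci * xi for ci, xi in zip(c, x))
    rhs = scale * sum(qx)
    return lhs, rhs, q


if __name__ == "__main__":
    lhs, rhs, _ = check_identity()
    print(lhs, rhs)
```

Any other orthogonal matrix with the same property would serve equally well, since the argument uses only $Q_c^T Q_c = I$.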

4.3 A more general homogeneous f(x) 

In this subsection we will look at a more general homogeneous function $f_h(x)$. Namely, we will set

$$f_h(x) = f_{bp}(x) = \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i, \qquad (75)$$

where $k = \beta n$ and $\beta < \alpha_1$ is a constant independent of n. This function is an interesting choice for at
least three reasons. First, it appears as a very important object in studying sparse solutions of random
under-determined linear systems of equations. Second, it is a function for which (48) holds. And third, it has a
nice structure that allows one to actually analytically compute $\xi_D^{(l)}$ (and, since (48) holds, $\xi_D^{(u)}$ as well).
Below we closely follow the presentation of Section 4.1. To that end we start with (52), which in the
case of interest here simplifies to (we are again mostly concerned with the scenario where $\xi_h(f_h, A, B) = \xi_h(f_{bp}, A, B) < 0$)

$$\begin{aligned} L_{bp}^{(l)} = \min_{x}\quad & \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i - \xi_D^{(l)}(f_{bp}) \\ \text{subject to}\quad & g^T x + \left((1 - \epsilon_1^{(m)})\sqrt{m_1 + m_2/2} - \epsilon_5^{(g)}\sqrt{n}\right) \le 0 \\ & \|x\|_2 \le 1. \end{aligned} \qquad (76)$$



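Also, to make writing easier, as in Section 4.1, we will use $D^{(l)}$ from (57). Now we rewrite (76) in the
following more convenient way

$$\begin{aligned} L_{bp}^{(l)} = \min_{x} \max_{\lambda \ge 0}\quad & \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i + \lambda g^T x + \lambda \sqrt{D^{(l)}}\sqrt{n} - \xi_D^{(l)}(f_{bp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (77)$$

Since the duality easily holds one then further has

$$\begin{aligned} L_{bp}^{(l)} = \max_{\lambda \ge 0} \min_{x}\quad & \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i + \lambda g^T x + \lambda \sqrt{D^{(l)}}\sqrt{n} - \xi_D^{(l)}(f_{bp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (78)$$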
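After solving the inner minimization we finally have

$$\begin{aligned} L_{bp}^{(l)} &= \max_{\lambda \ge 0}\left(-\sqrt{\|(\mathbf{1}_{n-k} - \lambda|g_{1:n-k}|)_-\|_2^2 + \|\mathbf{1}_k + \lambda g_{n-k+1:n}\|_2^2} + \lambda\sqrt{D^{(l)}}\sqrt{n}\right) - \xi_D^{(l)}(f_{bp}) \\ &= \max_{\theta \ge 0}\left(\theta^{-1}\left(-\sqrt{\|(\theta\mathbf{1}_{n-k} - |g_{1:n-k}|)_-\|_2^2 + \|\theta\mathbf{1}_k + g_{n-k+1:n}\|_2^2} + \sqrt{D^{(l)}}\sqrt{n}\right)\right) - \xi_D^{(l)}(f_{bp}), \end{aligned} \qquad (79)$$

where $\mathbf{1}_{n-k}$ and $\mathbf{1}_k$ are the $(n-k)$- and k-dimensional column vectors of all ones, respectively. Also, $g_{1:n-k}$
and $g_{n-k+1:n}$ are the vectors comprised of the first $n-k$ and the last k components of g, respectively. The vector
$(\mathbf{1}_{n-k} - \lambda|g_{1:n-k}|)_-$ is comprised only of the negative components of the vector $(\mathbf{1}_{n-k} - \lambda|g_{1:n-k}|)$,
and analogously the vector $(\theta\mathbf{1}_{n-k} - |g_{1:n-k}|)_-$ is comprised only of the negative components of the vector
$(\theta\mathbf{1}_{n-k} - |g_{1:n-k}|)$.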

Now, clearly, $L_{bp}^{(l)}$ is a random quantity. To completely understand its random behavior one would
need to study it in full detail. However, since this paper is mostly concerned with a conceptual approach
rather than with the details of particular calculations we will, as in Section 4.1, skip all unnecessary portions
and focus only on the main results. To that end we just mention, without proof, that $L_{bp}^{(l)}$ concentrates
around its mean with overwhelming probability (the proof of this fact needs some work but is conceptually
easy; a majority of the details needed for the proof can be found in e.g. [5, 6]). Given all of this, it is clear
that to apply the results of Lemma 2 it is enough to compute $EL_{bp}^{(l)}$ and then choose $\xi_D^{(l)}(f_{bp})$ such that
$EL_{bp}^{(l)} > 0$. When n is large one then has

$$\lim_{n\to\infty} \frac{EL_{bp}^{(l)}}{\sqrt{n}} = \max_{\theta \ge 0}\left(\theta^{-1}\left(-\sqrt{2(1-\beta)\int_{-\infty}^{-\theta} (\theta + z)^2 \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz + \beta(1 + \theta^2)} + \sqrt{D^{(l)}}\right)\right) - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{bp})}{\sqrt{n}}. \qquad (80)$$

After solving the integral one further has

$$\lim_{n\to\infty} \frac{EL_{bp}^{(l)}}{\sqrt{n}} = \max_{\theta \ge 0}\left(\theta^{-1}\left(-\sqrt{2(1-\beta)\left(-\frac{\theta e^{-\theta^2/2}}{\sqrt{2\pi}} + \frac{\theta^2 + 1}{2}\,\mathrm{erfc}\left(\frac{\theta}{\sqrt{2}}\right)\right) + \beta(1 + \theta^2)} + \sqrt{D^{(l)}}\right)\right) - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{bp})}{\sqrt{n}}. \qquad (81)$$
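
The passage from (80) to (81) rests on a closed form for a Gaussian tail integral. The sketch below compares that closed form with a plain trapezoidal approximation. It is only a numerical sanity check; the function names, the truncation point of the integral, and the number of steps are illustrative assumptions.

```python
import math


def tail_second_moment_closed_form(theta):
    """Closed form of (1/sqrt(2*pi)) * int_{-inf}^{-theta} (theta + z)^2 exp(-z^2/2) dz."""
    return (-theta * math.exp(-theta * theta / 2.0) / math.sqrt(2.0 * math.pi)
            + 0.5 * (theta * theta + 1.0) * math.erfc(theta / math.sqrt(2.0)))


def tail_second_moment_numeric(theta, upper=12.0, steps=200000):
    """Trapezoidal approximation of the same integral.

    The substitution z -> -z moves the range to [theta, infinity), which is
    truncated at `upper` because the Gaussian tail beyond it is negligible.
    """
    h = (upper - theta) / steps
    total = 0.0
    for i in range(steps + 1):
        z = theta + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (z - theta) ** 2 * math.exp(-z * z / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.5):
        print(theta, tail_second_moment_closed_form(theta), tail_second_moment_numeric(theta))
```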



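Let

$$\phi^{(l)}(\theta) = \theta^{-1}\left(-\sqrt{2(1-\beta)\left(-\frac{\theta e^{-\theta^2/2}}{\sqrt{2\pi}} + \frac{\theta^2 + 1}{2}\,\mathrm{erfc}\left(\frac{\theta}{\sqrt{2}}\right)\right) + \beta(1 + \theta^2)} + \sqrt{D^{(l)}}\right), \qquad (82)$$

and

$$\theta^{(l)} = \arg\max_{\theta \ge 0} \phi^{(l)}(\theta). \qquad (83)$$

Then one has

$$\lim_{n\to\infty} \frac{EL_{bp}^{(l)}}{\sqrt{n}} = \begin{cases} \phi^{(l)}(\theta^{(l)}) - \lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{bp})}{\sqrt{n}}, & \text{if } \max_{\theta \ge 0} \phi^{(l)}(\theta) < 0 \\ -\lim_{n\to\infty} \frac{\xi_D^{(l)}(f_{bp})}{\sqrt{n}}, & \text{otherwise.} \end{cases} \qquad (84)$$

Now we set

$$\xi_D^{(l)}(f_{bp}) = \begin{cases} \phi^{(l)}(\theta^{(l)})\sqrt{n}, & \text{if } \max_{\theta \ge 0} \phi^{(l)}(\theta) < 0 \\ 0, & \text{otherwise.} \end{cases} \qquad (85)$$

We then, based on Lemma 2 and the previous discussion, have

$$\lim_{n\to\infty} P\left(\xi_h(f_{bp}, A, B) \ge \xi_D^{(l)}(f_{bp})\right) = 1, \qquad (86)$$

where obviously $\xi_D^{(l)}(f_{bp})$ is as in (85).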

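Since for $f_{bp}(x)$ from (75) and $f_1(x) = \|x\|_2 - 1$ (48) holds, one can now make use of Lemma 4 to
upper-bound $\xi_h(f_{bp}, A, B)$. One starts by writing the following analogue to (58)

$$\begin{aligned} U_{bp}^{(u)} = \min_{x} \max_{\lambda \ge 0}\quad & \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i + \lambda g^T x + \lambda \sqrt{D^{(u)}}\sqrt{n} - \xi_D^{(u)}(f_{bp}) \\ \text{subject to}\quad & \|x\|_2 \le 1. \end{aligned} \qquad (87)$$

After repeating the previous arguments and relying on Lemma 4 one then arrives at

$$\lim_{n\to\infty} P\left(\xi_h(f_{bp}, A, B) \le \xi_D^{(u)}(f_{bp})\right) = 1, \qquad (88)$$

where $\xi_D^{(u)}(f_{bp})$ would, analogously to (85), be

$$\xi_D^{(u)}(f_{bp}) = \begin{cases} \phi^{(u)}(\theta^{(u)})\sqrt{n}, & \text{if } \max_{\theta \ge 0} \phi^{(u)}(\theta) < 0 \\ 0, & \text{otherwise,} \end{cases} \qquad (89)$$

$D^{(u)}$ would be as in (65), and $\phi^{(u)}(\theta)$ and $\theta^{(u)}$ would, analogously to (82) and (83), be

$$\phi^{(u)}(\theta) = \theta^{-1}\left(-\sqrt{2(1-\beta)\left(-\frac{\theta e^{-\theta^2/2}}{\sqrt{2\pi}} + \frac{\theta^2 + 1}{2}\,\mathrm{erfc}\left(\frac{\theta}{\sqrt{2}}\right)\right) + \beta(1 + \theta^2)} + \sqrt{D^{(u)}}\right), \qquad (90)$$

and

$$\theta^{(u)} = \arg\max_{\theta \ge 0} \phi^{(u)}(\theta). \qquad (91)$$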

We summarize the above presentation in the following convenient lemma. 

Lemma 7. Consider the optimization problem in (50). Let $f_h(x) = f_{bp}(x) = \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i$. Let
A be an $m_1 \times n$ matrix with i.i.d. standard normal components. Let B be an $m_2 \times n$ matrix with i.i.d.
standard normal components. Assume that n is large and that $m_1 = \alpha_1 n$ and $m_2 = \alpha_2 n$ where $\alpha_1$ and $\alpha_2$
are constants independent of n. Let $D^{(l)}$, $D^{(u)}$, $\phi^{(l)}(\theta)$, $\theta^{(l)}$, $\phi^{(u)}(\theta)$, and $\theta^{(u)}$ be as in (57), (65), (82), (83),
(90), and (91), respectively. Further, let $\xi_D^{(l)}(f_{bp})$ and $\xi_D^{(u)}(f_{bp})$ be as in (85) and (89), respectively. Let the $\epsilon$'s
in (57) and (65) be arbitrarily small constants independent of n. Then

$$\lim_{n\to\infty} P\left(\xi_D^{(l)}(f_{bp}) \le \xi_h(f_{bp}, A, B) \le \xi_D^{(u)}(f_{bp})\right) = 1.$$



Proof. Follows from previous discussion. □ 

Remark: Taking $\phi^{(l)}(\theta^{(l)})$ (or $\phi^{(u)}(\theta^{(u)})$) and equating it to zero would give the
critical dependence between $\beta$, $\alpha_1$, and $\alpha_2$ such that (50) has a negative optimal value of the objective function with
probability that goes to 1 as $n \to \infty$. In fact, this (with $\alpha_2 \to 0$) is precisely what was done in [5, 6] to
obtain the critical threshold for the success of $\ell_1$ optimization in recovering sparse solutions of random
under-determined linear systems of equations (of course, in [5, 6] we were strictly interested in characterizing the
critical threshold and properties of (76) in as explicit a way as possible and conducted a substantial further
massage of (90) and (91), which we skip here).



4.3.1 Numerical example 

As in Subsection 4.1, to give a bit more flavor as to how useful the results from the
previous subsection are in practice, we conducted a limited set of numerical experiments. Namely, we solved problem
(50) with A and B as randomly generated i.i.d. Gaussian matrices and $f_h(x) = f_{bp}(x) = \sum_{i=1}^{n-k} |x_i| + \sum_{i=n-k+1}^{n} x_i$.
We again repeated our experiment a number of times with different (but of course random)
A and B. The results we obtained are summarized in Table 2. The second row contains the numerical
values obtained through the simulations and the third row contains the numerical values that the above
theory predicts. As can be seen from Table 2, even for a fairly small value of n one, as in Subsection 4.1,
has a solid agreement between what the above theory predicts and the results obtained through numerical
experiments. The results presented in Table 2 are given for the expected values whereas Lemma 7 gives
a probabilistic type of behavior. However, as we mentioned earlier, all important quantities do concentrate,
and they concentrate around their mean values.
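
The theoretical values for this experiment can be approximated by maximizing $\phi^{(l)}(\theta)$ from (82) numerically. The sketch below does so by a plain grid search with all $\epsilon$'s set to zero, so that $D^{(l)} = \alpha_1 + \alpha_2/2$. The function names, the grid resolution, and the search range for $\theta$ are illustrative assumptions rather than part of the paper.

```python
import math


def phi(theta, beta, sqrt_d):
    """phi(theta) as in (82), with sqrt_d standing for sqrt(D) and theta > 0."""
    tail = (-theta * math.exp(-theta * theta / 2.0) / math.sqrt(2.0 * math.pi)
            + 0.5 * (theta * theta + 1.0) * math.erfc(theta / math.sqrt(2.0)))
    inner = 2.0 * (1.0 - beta) * tail + beta * (1.0 + theta * theta)
    return (-math.sqrt(max(inner, 0.0)) + sqrt_d) / theta


def predicted_value(beta, alpha1=0.5, alpha2=0.5, grid=20000, theta_max=10.0):
    """Grid-search approximation of max over theta of phi, clipped at zero as in (85)."""
    sqrt_d = math.sqrt(alpha1 + alpha2 / 2.0)   # all epsilons set to zero (assumption)
    best = max(phi(theta_max * (i + 1) / grid, beta, sqrt_d) for i in range(grid))
    return min(best, 0.0)


if __name__ == "__main__":
    for beta in (0.42, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
        print(f"beta = {beta:.2f}   predicted xi_h/sqrt(n) = {predicted_value(beta):.4f}")
```

For $\beta = 1$ the objective reduces to the purely linear case of Section 4.1, so the computed value should be close to $-0.5$; the other values can be compared with the theoretical row of Table 2.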



Table 2: Experimental results for (50); $\alpha_1 = 0.5$, $\alpha_2 = 0.5$; (50) was run 1000 times with $n = 200$

$\beta$                                                     0.42     0.5      0.6      0.7      0.8      0.9      1
$E\xi_h(f_{bp},A,B)/\sqrt{n}$ (sim.)                    -0.0265  -0.0904  -0.1797  -0.2645  -0.3470  -0.4242  -0.4979
$\lim_{n\to\infty} E\xi_h(f_{bp},A,B)/\sqrt{n}$ (th.)   -0.0189  -0.0936  -0.1825  -0.2672  -0.3481  -0.4256  -0.5000

5 Conclusion

In this paper we looked at classic linearly constrained optimization problems. We viewed them in a statistical
context. We provided a general way of characterizing their optimal values. More specifically, we provided
a generic strategy that can help create a lower bound on the optimal value of the objective function. The
strategy is based on transforming the original problem to a simpler probabilistic alternative. On the other
hand, for a specific type of objective function we were then able to create an analogous strategy that can
help create an upper bound on the optimal value of the objective function. Moreover, probabilistically
speaking, the two bounds match, which essentially means that the lower-bounding strategy (which works
for any objective function) in certain scenarios is actually good enough to optimally characterize the entire
problem.

We then mentioned that the presented framework is fairly powerful and presented ways in which one can
modify it to cover various other optimization problems. Still, the modifications that we presented are fairly
simple, and we chose to present them just to give an idea of how relatively easy it is to use the presented
strategies. Of course a whole lot more can be done, i.e. the class of optimization problems where the strategies
presented here will work is much wider than the few examples that we presented. However, since this is an
introductory paper where we intended just to present the core concepts of a much bigger theory, we skipped a
detailed discussion of what the limits of our propositions are. Also, many further modifications/extensions
are typically problem specific, and we thought that it is better to cover them separately and present such
coverage elsewhere.

What is also important to stress is that we viewed optimization problems in a statistical context. To be
more precise, we assumed a typical Gaussian scenario where all random quantities in any of our problems
are assumed to be i.i.d. standard normals. These assumptions substantially simplified the exposition but
are not really necessary. In fact, all results presented here would actually hold for a fairly large class of
random distributions. Proving that is not that hard. There are many ways in which it can be done, but
they typically boil down to repeated use of the central limit theorem. For example, a particularly simple
and elegant approach would be the one of Lindeberg [4]. Adapting our exposition to fit into the framework
of the Lindeberg principle is relatively easy and, if one uses the elegant approach of [2], pretty much
routine. Since we did not create these techniques we chose not to do these routine generalizations. However,
to make sure that the interested reader has a full grasp of the generality of the results presented here, we
emphasize again that pretty much any distribution that can be pushed through the Lindeberg principle would
work in place of the Gaussian one that we used.

Since the theory that we presented above in a way establishes a random duality, we decided to call it
that way. Along the lines of the above mentioned probabilistic generality of our theory, we then coined
the term regularly random duality, where by regularly random we essentially mean any randomness that
eventually, in large dimensional settings, boils down to Gaussian. It is quite possible that there are other
classes of randomness for which similar theories can be built. While they may not be as powerful as the
Gaussian one, it would certainly (at least from a mathematical point of view) be interesting to see what their
shapes and forms are.